A Link Clustering Based Approach for Clustering Categorical Data
Abstract
Categorical data clustering (CDC) and link clustering (LC) have been considered as separate research and application areas. The main focus of this paper is to investigate the commonalities between these two problems and the uses of these commonalities for the creation of new clustering algorithms for categorical data, based on cross-fertilization between the two disjoint research fields. More precisely, we formally transform the CDC problem into an LC problem and apply the LC approach to clustering categorical data. Experimental results on real datasets show that the LC-based clustering method is competitive with existing CDC algorithms with respect to clustering accuracy.
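The abstract does not say how the CDC problem is mapped onto an LC problem. As a purely illustrative sketch, and not the paper's actual formulation, one way to view categorical records as linked objects is to treat each record and each attribute-value pair as a node in a bipartite graph, with an edge whenever a record takes that value. Two records are then "linked" through the value nodes they share. The function names and the toy records below are hypothetical.

```python
from itertools import combinations


def records_to_bipartite_edges(records):
    """Return edges (record_id, (attribute_index, value)) for every cell.

    Assumption: each record is a tuple of categorical values, and the same
    value under different attributes counts as a different node.
    """
    edges = []
    for rid, record in enumerate(records):
        for attr, value in enumerate(record):
            edges.append((rid, (attr, value)))
    return edges


def shared_value_counts(records):
    """For each pair of records, count the attribute-value nodes they share."""
    neighbors = {}
    for rid, node in records_to_bipartite_edges(records):
        neighbors.setdefault(rid, set()).add(node)
    return {
        (a, b): len(neighbors[a] & neighbors[b])
        for a, b in combinations(sorted(neighbors), 2)
    }


# Toy data, invented for illustration only.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]
edges = records_to_bipartite_edges(records)
links = shared_value_counts(records)
```

Under these assumptions, any link-based clustering procedure could take `edges` or `links` as input; which procedure the paper actually uses is not stated in the abstract.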
Journal: CoRR
Volume: abs/cs/0412019
Publication date: 2004